On the Complexity of Optimal Microaggregation for Statistical Disclosure Control
نویسندگان
چکیده
Statistical disclosure control (SDC), also termed inference control two decades ago, is an integral part of data security dealing with the protection of statistical databases. The basic problem in SDC is to release data in a way that does not lead to disclosure of individual information (high security) but preserves the informational content as much as possible (low information loss). SDC is dual with data mining in that progress of data mining techniques forces official statistics to a continual improvement of SDC techniques: the more powerful the inferences that can be made on a released data set, the more protection is needed so that no inference jeopardizes the privacy of individual respondents’ numerical data. This paper deals with the computational complexity of optimal microaggregation, where optimal means yielding minimal information loss for a fixed security level. More specifically, we show that the problem of optimal microaggregation cannot be exactly solved in polynomial time. This result is relevant because it provides theoretical justification for the lack of exact optimal algorithms and for the current use of heuristic approaches.
منابع مشابه
Improved Univariate Microaggregation for Integer Values
Privacy issues during data publishing is an increasing concern of involved entities. The problem is addressed in the field of statistical disclosure control with the aim of producing protected datasets that are also useful for interested end users such as government agencies and research communities. The problem of producing useful protected datasets is addressed in multiple computational priva...
متن کاملA polynomial-time approximation to optimal multivariate microaggregation
Microaggregation is a family of methods for statistical disclosure control (SDC) of microdata (records on individuals and/or companies), that is, for masking microdata so that they can be released without disclosing private information on the underlying individuals. Microaggregation techniques are currently being used by many statistical agencies. The principle of microaggregation is to group o...
متن کاملRecord Ordering Heuristics for Disclosure Control through Microaggregation
Statistical disclosure control (SDC) methods reconcile the need to release information to researchers with the need to protect privacy of individual records. Microaggregation is a SDC method that protects data subjects by guarantying k-anonymity: Records are partitioned into groups of size at least k and actual data values are replaced by the group means so that each record in the group is indi...
متن کاملRepeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
متن کاملTFRP: An efficient microaggregation algorithm for statistical disclosure control
Recently, the issue of Statistic Disclosure Control (SDC) has attracted much attention. SDC is a very important part of data security dealing with the protection of databases. Microaggregation for SDC techniques is widely used to protect confidentiality in statistical databases released for public use. The basic problem of microaggregation is that similar records are clustered into groups, and ...
متن کامل